Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis

نویسندگان

Decha Moungsri

Tomoki Koriyama

Takao Kobayashi

چکیده

In Thai language, stress is an important prosodic feature that not only affects naturalness but also has a crucial role in meaning of phrase-level utterance. It is seen that a speech synthesis model that is trained with lack of stress and phrase-level information causes incorrect tones and ambiguity in meaning of synthetic speech. Our previous work has shown that manually annotated stress information improves naturalness of synthetic speech. However, a high time consumption is a drawback of the manual annotation. In this paper, we utilize an unsupervised learning technique called Bayesian Gaussian process latent variable model (Bayesian GP-LVM) to automatically put stress annotation on the given training data. Stress related features are projected onto a latent space in which syllables are easier classified into stressed/unstressed classes. We use the stressed/unstressed information as an additional context in GPR-based speech synthesis. Experimental results show that the proposed technique improves naturalness of synthetic speech as well as accuracy of stressed/unstressed classification. Moreover, the proposed technique enables us to avoid ambiguity in meaning of synthetic speech by providing intended stress position into context label sequence to be synthesized.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Tone modeling using Gaussian process latent variable model for statistical speech synthesis

In continuous speech of Thai language, tone pronunciation is affected by several factors. One of significant factors is stress that causes a diversity of F0 contours of tone, and affects syllable durations. Our previous studies have shown that a stressed/unstressed syllable context improves tone modeling accuracy. However, the stress in Thai language is generally unknown for a given input text ...

متن کامل

Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

In this work, synthesis of facial animation is done by modelling the mapping between facial motion and speech using the shared Gaussian process latent variable model. Both data are processed separately and subsequently coupled together to yield a shared latent space. This method allows coarticulation to be modelled by having a dynamical model on the latent space. Synthesis of novel animation is...

متن کامل

Gaussian Process Latent Random Field

The Gaussian process latent variable model (GPLVM) is an unsupervised probabilistic model for nonlinear dimensionality reduction. A supervised extension, called discriminative GPLVM (DGPLVM), incorporates supervisory information into GPLVM to enhance the classification performance. However, its limitation of the latent space dimensionality to at most C − 1 (C is the number of classes) leads to ...

متن کامل

Stochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints

Gaussian process latent variable models (GPLVMs) are a probabilistic approach to modelling data that employs Gaussian process mapping from latent variables to observations. This paper revisits a recently proposed variational inference technique for GPLVMs and methodologically analyses the optimality and different parameterisations of the variational approximation. We investigate a structured va...

متن کامل

Gaussian Process Latent Variable Alignment Learning

We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Learning alignments is an ill-constrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on non-para...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis

نویسندگان

چکیده

منابع مشابه

Tone modeling using Gaussian process latent variable model for statistical speech synthesis

Speech-Driven Facial Animation Using a Shared Gaussian Process Latent Variable Model

Gaussian Process Latent Random Field

Stochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints

Gaussian Process Latent Variable Alignment Learning

عنوان ژورنال:

اشتراک گذاری